Amendments to the Claims



Status of Claims: 

Claims 1-20 are pending for examination.

Claim 21 is added by the present amendment.

Claims 2 and 3 are canceled by the present amendment.

Claims 1, 7, and 15 are in independent form.

1. (Currently Amended) A system for automatically determining a language of a
document from a set of candidate [[of]] languages, the system comprising:

a database containing probability data for a plurality of text strings each having a
predetermined length equal to each other, each text string of the plurality of text strings
having an associated probability value indicating a probability that the text string occurs
within a language based on occurrences of the text string in all of the candidate
languages;

logic for setting a negative assumption value for each of the candidate languages
indicating the document is not one of the candidate languages;

an extractor for extracting a character string from the document, the character
string having a length equal to the predetermined length of the plurality of text strings
contained in the database; and

a language analyzer for determining a probability value for each of the candidate
languages that the character string does not belong to the candidate languages by
retrieving the probability value associated to the character string from the database for
each of the candidate languages, and includes logic for adjusting the negative assumption
value based on the probability value, the language analyzer determining that the
document is one language of the candidate languages when the negative assumption
value passes a threshold value.

2. (Canceled)

3. (Canceled)

4. (Original) The system as set forth in claim 1 further including an information 
retrieval engine for retrieving documents in response to a search request, the documents 
retrieved being analyzed by the language analyzer.

5. (Original) The system as set forth in claim 1 wherein the logic for adjusting includes 
logic for combining the negative assumption value with the probability value. 

6. (Original) The system as set forth in claim 1 wherein the language analyzer further
includes iteration logic for causing the extractor to extract another character string from 
the document if the negative assumption value fails to pass the threshold value. 

7. (Currently Amended) A method of determining a language of a document from a 
set of candidate languages, the method comprising the steps of: 

setting a null hypothesis to a true value for each candidate language indicating the 
document is not in the candidate language and setting a false value; 

extracting a text string from the document, the text string having a predetermined
length;

determining a contrary probability for each candidate language that the text string 
does not belong to the candidate language based on probabilities that the text string 
belongs to each of the candidate languages where the probabilities are retrieved from a
database that stores probability values for a plurality of text strings each having the 
predetermined length, each text string of the plurality of text strings having an associated 
probability value for each candidate language indicating a probability that the text string 
occurs within a language from the candidate languages based on occurrences of the text 
string in all of the candidate languages;

adjusting the null hypothesis for each candidate language with the contrary
probability corresponding to the candidate language; and

determining the document is one language from the candidate languages when the
null hypothesis for the one language is disproved by approaching the false value.

8. (Original) The method as set forth in claim 7 further includes setting a threshold 
value indicating that the null hypothesis is disproved. 

9. (Original) The method as set forth in claim 8 further includes repeating the extracting 
step for a different text string from the document and repeating the method until the null
hypothesis is disproved for one of the candidate languages by passing the threshold value. 
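
For illustration only, the method of claims 7 to 9 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not part of the claims. The names `detect_language`, `prob_table`, `n`, and `threshold` are hypothetical. The sketch assumes that the contrary probability is one minus the stored probability, that the null hypothesis is adjusted by multiplication, and that strings absent from the database are skipped; the claims do not fix any of these choices.

```python
def detect_language(text, prob_table, n=3, threshold=1e-6):
    """Return the detected language, or None if no null hypothesis is disproved.

    prob_table maps an n-character string to {language: probability}.
    All names and numeric defaults here are illustrative assumptions.
    """
    languages = {lang for probs in prob_table.values() for lang in probs}
    if not languages:
        return None

    # Null hypothesis per language: "the document is NOT in this language".
    # It starts at the true value (1.0) and is disproved as it nears 0.0.
    null_hypothesis = {lang: 1.0 for lang in languages}

    for i in range(len(text) - n + 1):
        chunk = text[i:i + n]          # text string of predetermined length
        probs = prob_table.get(chunk)
        if probs is None:
            continue                   # assumption: skip strings not in the database

        for lang in languages:
            # Assumption: contrary probability = 1 - P(language | string).
            contrary = 1.0 - probs.get(lang, 0.0)
            # Assumption: "adjusting" means multiplying.
            null_hypothesis[lang] *= contrary

        best = min(null_hypothesis, key=null_hypothesis.get)
        if null_hypothesis[best] <= threshold:
            return best                # null hypothesis disproved for this language

    return None
```

With a small hypothetical table, a call such as `detect_language("the the the the", table, n=3, threshold=0.01)` stops as soon as one language's null hypothesis falls to or below the threshold, so the whole document need not be scanned.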

10. (Original) The method as set forth in claim 7 further includes pregenerating 
probability data corresponding to each candidate language, the probability data including 
a probability value for a text string that is normalized based on an occurrence probability 
of the text string in all the candidate languages. 

11. (Original) The method as set forth in claim 7 further includes identifying the 
document based on a search request. 

12. (Original) The method as set forth in claim 7 wherein the extracting step includes 
extracting a plurality of sequential characters that form the text string.

13. (Original) The method as set forth in claim 7 wherein the setting step includes setting
the true value to 1 and setting the false value to 0. 

14. (Original) The method as set forth in claim 7 wherein the contrary probability for a
first candidate language is determined based on a number of occurrences of the text string 
found in a sample set of documents from the first candidate language which is normalized 
by a sum of occurrences of the text string found in a sample set of documents from all the
candidate languages. 
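
For illustration only, the normalization recited in claims 10 and 14 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not part of the claims. The names `build_probability_table`, `samples`, and `n` are hypothetical. The sketch stores, for each text string, its occurrence count in one language divided by its occurrence count across all candidate languages; deriving a contrary probability from that share (for example, as one minus the share) is an assumption left to the caller.

```python
from collections import Counter, defaultdict


def build_probability_table(samples, n=3):
    """Build {text_string: {language: normalized probability}}.

    samples maps a language name to a list of sample documents (strings).
    All names here are illustrative assumptions.
    """
    # counts[text_string][language] = occurrences in that language's samples
    counts = defaultdict(Counter)
    for lang, documents in samples.items():
        for doc in documents:
            for i in range(len(doc) - n + 1):
                counts[doc[i:i + n]][lang] += 1

    table = {}
    for chunk, per_lang in counts.items():
        # Normalize by the sum of occurrences over all candidate languages.
        total = sum(per_lang.values())
        table[chunk] = {lang: per_lang[lang] / total for lang in samples}
    return table
```

For example, if a three-character string occurs once in the English samples and once in the French samples, both languages receive a normalized value of 0.5 for that string.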

15. (Currently Amended) A process of determining that a document is in a selected
language, the process comprising the steps of: 

setting a probability assumption indicating that the document [[in]] is not in the
selected language; 

extracting a character string from the document; and

disproving the probability assumption based on a contrary probability that the 
character string does not belong to the selected language such that if the contrary
probability fails to support the probability assumption, then the document is determined 
as being in the selected language. 

16. (Original) The process as set forth in claim 15 further includes determining the 
document is the selected language from a set of candidate languages.

17. (Original) The process as set forth in claim 16 further including generating a 
probability database having a contrary probability for each of a plurality of character 
strings for each of the candidate languages, where the contrary probability of a character 
string in one language is determined based on an occurrence frequency of the character
string in the one language influenced by a total occurrence frequency of the character 
string in all the candidate languages.

18. (Original) The process as set forth in claim 17 further including determining the
occurrence frequency of each character string based on a sample set of documents 
provided for each of the candidate languages.

19. (Original) The process as set forth in claim 17 wherein the contrary probability of the 
character string in one language is normalized by the total occurrence frequency of the
character string in all the candidate languages. 

20. (Original) The process as set forth in claim 15 further including identifying the 
document in response to a search request. 

21. (New) A computer program product configured to perform the process of claim 15.